Optimizing the performance of Lattice Gauge Theory simulations with Streaming SIMD extensions

Author

Shyam Srinivasan

Abstract

Two factors that affect simulation quality are the amount of computing power and the implementation. The Streaming SIMD (single instruction, multiple data) Extensions (SSE) offer a way to influence both by exploiting the processor's parallel functionality. In this paper, we show how SSE improves the performance of lattice gauge theory simulations. An analysis of data from various runs revealed two significant trends. First, the speed-ups were higher for single-precision than for double-precision floating-point numbers. Second, although SSE significantly reduced simulation time, it did not deliver the theoretical maximum speed-up. There are several reasons for this: architectural constraints imposed by the front-side bus (FSB) speed, the spatial and temporal patterns of data retrieval, the ratio of computational to non-computational instructions, and the need to interleave miscellaneous instructions with computational instructions. We present a model for analyzing SSE performance that can help factor in bottlenecks or weaknesses in the implementation, the computing architecture, and the mapping of software to the computing substrate when evaluating the improvement in efficiency. The model could also be useful in evaluating other computational frameworks and in predicting the benefits of future hardware or architectural improvements.
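As a rough illustration of why the single-precision and double-precision speed-ups differ, and why measured speed-ups can fall short of the theoretical maximum, the sketch below counts how many values fit in a 128-bit SSE register and applies Amdahl's law to a partially vectorized workload. This is not the model presented in the paper: the function names and the `vector_fraction` parameter are hypothetical, and Amdahl's law is used here only as a simple stand-in for the effect of non-computational instructions.

```python
SSE_REGISTER_BITS = 128  # width of one SSE (XMM) register


def simd_lanes(element_bits: int, register_bits: int = SSE_REGISTER_BITS) -> int:
    """Number of values of the given width packed into one register."""
    return register_bits // element_bits


def amdahl_speedup(vector_fraction: float, lanes: int) -> float:
    """Upper bound on speed-up when only part of the run time is vectorized.

    vector_fraction is an assumed share of scalar run time that benefits
    from SIMD; the remainder (loads, stores, shuffles, control flow) is
    treated as running at scalar speed.
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must lie in [0, 1]")
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / lanes)


single_lanes = simd_lanes(32)  # 32-bit floats per register
double_lanes = simd_lanes(64)  # 64-bit doubles per register

# With an assumed 80% vectorizable share, both bounds drop below the lane count.
single_bound = amdahl_speedup(0.8, single_lanes)
double_bound = amdahl_speedup(0.8, double_lanes)
```

Under these assumptions the single-precision bound stays above the double-precision one, and both stay below the ideal factors of 4 and 2, which is the qualitative pattern the abstract describes. Memory-bandwidth limits such as FSB speed are not captured by this sketch.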

Similar resources

Performance of SSE and AVX Instruction Sets

SSE (streaming SIMD extensions) and AVX (advanced vector extensions) are SIMD (single instruction multiple data streams) instruction sets supported by recent CPUs manufactured by Intel and AMD. This SIMD programming allows parallel processing by multiple cores in a single CPU. Basic arithmetic and data transfer operations such as sum, multiplication and square root can be processed simultaneous...

Lattice QCD Calculations on Commodity Clusters at DESY

Lattice Gauge Theory is an integral part of particle physics that requires high performance computing in the multi-Tflops regime. These requirements are motivated by the rich research program and the physics milestones to be reached by the lattice community. Over the last years the enormous gains in processor performance, memory bandwidth, and external I/O bandwidth for parallel applications ha...

A Performance Evaluation Of Multimedia Kernels Using AltiVec Streaming SIMD Extensions

This paper aims to provide an understanding of performance of multimedia applications that use floating-point computations on recent general-purpose microprocessors using streaming SIMD ISA extensions. We used 8 benchmarks to study the impact of these extensions on general application performance and identify the eventual bottlenecks introduced.

Automatic Generation of Vectorized Fast Fourier Transform Libraries for the Larrabee and AVX Instruction Set Extension

Introduction The discrete Fourier transform (DFT) and its fast algorithms (fast Fourier transforms or FFTs) are among the most important computational building blocks in signal processing and scientific computing. Consequently, there is a number of high performance DFT libraries available including Intel’s Integrated Performance Primitives (IPP), FFTW [6], and libraries generated by Spiral [9, ...

Single Instruction Multiple Data – Not Everything is a Nail for this Hammer

Hardware vendors have been struggling to fight the power and memory wall for decades [1, 2]. Since most of the processing time depends on the number of instructions, number of used registers and dependencies between instructions, but not on the size of a register, independent data items of a vector (i.e., a column) could be processed in parallel. Hence, a silver lining seems to be Single Instru...

Journal title:

CoRR

Volume abs/1309.0551

Publication date: 2013
